Push local pool memory to a global list at thread termination #1621
Conversation
Force-pushed from 0d95e6a to 7b13522.
 {
-  PONY_ABA_PROTECTED(T) ret = {NULL, 0};
+  PONY_ABA_PROTECTED_PTR(T) ret = {NULL, 0};
The interlocked exchange seems wrong. Such a heavyweight operation for a load? I can't see anywhere where that would be necessary?
Similarly, big_store uses a CAS where a store should be sufficient.
This is because x86 doesn't have plain load/store instructions for 16-byte operands. The only instruction that can work on an operand of that size is cmpxchg16b, and if we used two plain mov instructions instead, there would be a time window where a thread could modify one half of the value while another thread is still reading the other half. For reference, both GCC and Clang also generate cmpxchg16b for atomic loads/stores of 16-byte objects.
I've tried to reduce the overhead introduced by these operations by using non-atomic loads/stores whenever possible (for example, pool.c:465). Also, the fast bailout paths of both pool_pull and pool_block_pull don't use any big atomics or hardware synchronisation.
So what I'm thinking is that we are using bigatomic_load to "initialise" a 16-byte ABA-protected pointer before a CAS operation. If we load the two 8-byte chunks independently (cheaply!), we will either get a matched pair (which may succeed during the CAS, and everything is fine) or an unmatched pair.
If we get an unmatched pair, the loads could be in either order. If we read a stale ABA counter and a current pointer, we will do a second loop on the CAS, which is no more expensive than the initial atomic read. If we read a current ABA counter and a stale pointer: same result!
The only way to provoke an error would be if 2^64 writes happened between the ABA read and the pointer read, and the 2^64th write wrote back the old pointer.
Is there another error condition that I'm missing?
That would work, yes. I think it will only be possible in pool_push/pull and not in pool_block_pull/push, because in the latter the CAS can be on any element of the list and, if we fail, we always retry from the start of the list. I'll update the PR.
It turns out it is also possible in pool_block_push/pull, but with an additional branch to avoid the CAS when we don't need it. Would that be an interesting tradeoff?
After thinking about it, these functions won't be called very often, so it probably won't make a difference. I'll include it in the change.
@@ -239,6 +239,7 @@ DECLARE_THREAD_FN(ponyint_asio_backend_dispatch)
   close(b->wakeup);
   ponyint_messageq_destroy(&b->q);
   POOL_FREE(asio_backend_t, b);
+  pony_unregister_thread();
I like the explicit unregister approach.
Force-pushed from d6ef876 to ba28fee.
This is a change originally suggested by @sylvanc in ponylang#1621. On x86, this results in fewer occurrences of the expensive cmpxchg16b instruction.
src/libponyrt/sched/mpmcq.c (outdated)
@@ -53,7 +53,7 @@ void ponyint_mpmcq_destroy(mpmcq_t* q)
 {
   atomic_store_explicit(&q->head, NULL, memory_order_relaxed);
 #ifdef PLATFORM_IS_X86
-  PONY_ABA_PROTECTED(mpmcq_node_t*) tail = bigatomic_load_explicit(&q->tail,
+  PONY_ABA_PROTECTED_PTR(mpmcq_node_t) tail = bigatomic_load_explicit(&q->tail,
bigatomic_load_explicit isn't needed here either. Only the .object field is used.
src/libponyrt/sched/mpmcq.c (outdated)
@@ -40,7 +40,7 @@ void ponyint_mpmcq_init(mpmcq_t* q)

   atomic_store_explicit(&q->head, node, memory_order_relaxed);
 #ifdef PLATFORM_IS_X86
-  PONY_ABA_PROTECTED(mpmcq_node_t*) tail;
+  PONY_ABA_PROTECTED_PTR(mpmcq_node_t) tail;
No need for bigatomic_store_explicit here. There is no concurrency issue when initialising the queue.
Sorry @Praetonus, I just noticed two more.
This change adds a global list of free blocks to the pool allocator, similar to the existing global list of free lists. When a runtime thread terminates, it will push its local free blocks to the global list, which can then be retrieved by other threads. The goal of this change is to avoid memory leaks when starting and stopping the runtime multiple times in a program, for example in the compiler test suite. This pool allocator cleanup functionality is exposed to user threads through the new pony_unregister_thread function.
Force-pushed from 6b35cb0 to 9862796.
I've updated the PR.
Excellent, thanks @Praetonus!
This change adds a global list of free blocks to the pool allocator, similar to the existing global list of free lists. When a runtime thread terminates, it will push its local free blocks to the global list, which can then be retrieved by other threads.
The goal of this change is to avoid memory leaks when starting and stopping the runtime multiple times in a program, for example in the compiler test suite.
This pool allocator cleanup functionality is exposed to user threads through the new pony_unregister_thread function.
PS: I expect that this change will introduce a performance penalty at runtime termination, especially for long-running programs with lots of memory in their free lists. I haven't done any real measurements, so I don't have a precise estimate of the slowdown. If that's an issue, I can modify the patch to add a runtime option for pool cleanup.